# Multilingual Image Understanding
Llama 4 Scout 17B 4E Instruct
Llama 4 Scout is a multimodal Mixture-of-Experts (MoE) model from Meta with 17 billion active parameters. It supports 12 languages and image understanding, and features a dynamic expert-fusion mechanism with topk=4.
Large Language Model
Transformers · Supports Multiple Languages

shadowlilac
Downloads: 53 · Likes: 1
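The topk=4 setting above can be read, assuming it follows standard top-k MoE routing, as sending each token to the four experts with the highest router scores and mixing their outputs. The plain-Python sketch below shows that generic technique: select the k largest router logits, softmax over only those, and take the weighted sum of the selected experts' outputs. The function `topk_moe_forward`, the toy experts, and the router logits are all hypothetical; this is not Meta's implementation, and the exact routing used by this checkpoint is not documented here.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical expert type: maps an input vector to an output vector.
Expert = Callable[[List[float]], List[float]]


def topk_moe_forward(x: List[float], router_logits: Sequence[float],
                     experts: Sequence[Expert], k: int = 4) -> List[float]:
    """Generic top-k MoE step (illustrative sketch, not Llama 4's code)."""
    if not 0 < k <= len(experts):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router logits; sorted() is stable, so ties
    # keep the lower expert index first.
    top = sorted(range(len(experts)), key=lambda i: -router_logits[i])[:k]
    # Softmax over the selected logits only (subtract the max for stability).
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the selected experts' outputs; unselected experts never run.
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out


# Usage: 16 toy experts, where expert i scales its input by (i + 1).
experts = [lambda v, s=i + 1: [s * a for a in v] for i in range(16)]
logits = [0.0] * 16
for idx in (3, 7, 11, 15):
    logits[idx] = 2.0  # four equally preferred experts
print(topk_moe_forward([1.0, 2.0], logits, experts, k=4))  # [10.0, 20.0]
```

With four tied logits each selected expert gets weight 0.25, so the result is the average of scalings 4, 8, 12, and 16. Only k of the experts run per token, which is why an MoE model's active parameter count is much smaller than its total.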
Llama 4 Scout 17B 16E Unsloth Bnb 4bit
Other
Llama 4 Scout is a multimodal Mixture-of-Experts model developed by Meta. It supports 12 languages and image understanding, with 17 billion active parameters and a context length of up to 10M tokens.
Multimodal Fusion
Transformers · Supports Multiple Languages

unsloth
Downloads: 2,492 · Likes: 1
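The "Bnb 4bit" suffix in this model's name indicates that its weights are stored in 4-bit precision using the bitsandbytes library, which reduces weight memory to roughly a quarter of a 16-bit checkpoint. As a hedged illustration of the idea only, the sketch below maps a list of floats onto signed 4-bit integer levels in [-7, 7] using a single absmax scale. The function names are hypothetical, and bitsandbytes' own 4-bit formats (FP4 and NF4, with per-block scales) use different codebooks, so this is not what the library does internally.

```python
from typing import List, Tuple


def quantize_absmax_4bit(values: List[float]) -> Tuple[List[int], float]:
    """Simplified per-tensor absmax quantization to signed 4-bit levels.

    Hypothetical helper for illustration; not the bitsandbytes NF4/FP4 scheme.
    """
    # One scale for the whole list: the largest magnitude maps to level 7.
    scale = max((abs(v) for v in values), default=0.0) / 7.0
    if scale == 0.0:
        return [0] * len(values), 0.0
    # Round to the nearest level and clamp to the signed 4-bit range used here.
    q = [max(-7, min(7, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats from 4-bit levels and the shared scale."""
    return [x * scale for x in q]


# Usage: each weight now needs 4 bits plus one shared float scale.
weights = [0.7, -0.35, 0.1, 0.0]
levels, s = quantize_absmax_4bit(weights)
print(levels, dequantize(levels, s))
```

Each reconstructed value lies within half a quantization step (scale / 2) of the original; real 4-bit formats use per-block scales so that a single outlier does not coarsen the whole tensor.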
Chitrarth
Other
Chitrarth is a multilingual vision-language model that bridges vision and language, with a particular focus on Indian languages.
Image-to-Text
Safetensors · Supports Multiple Languages
krutrim-ai-labs
Downloads: 410 · Likes: 11
Paligemma 3b Pt 448
PaliGemma is a lightweight, versatile vision-language model built on the SigLIP vision encoder and the Gemma language model, supporting multilingual image-text tasks.
Image-to-Text
Transformers

google
Downloads: 2,708 · Likes: 29
Paligemma 3b Pt 224
PaliGemma is a versatile, lightweight vision-language model (VLM) built on the SigLIP vision encoder and the Gemma language model. It takes image and text inputs and generates text outputs.
Image-to-Text
Transformers

google
Downloads: 38.40k · Likes: 318
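The numeric suffix in the two PaliGemma names (224 and 448) is the square input image resolution in pixels. PaliGemma's SigLIP encoder splits the image into 14×14-pixel patches, so the resolution fixes how many image tokens are passed to Gemma: (224 / 14)² = 256 tokens versus (448 / 14)² = 1024 tokens. The 448 checkpoint therefore sees finer detail at about four times the image-token cost. The helper below is a hypothetical illustration of that arithmetic, not part of any PaliGemma API.

```python
def image_token_count(resolution: int, patch_size: int = 14) -> int:
    """Image tokens emitted for a square resolution x resolution input
    split into non-overlapping patch_size x patch_size patches."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be a multiple of the patch size")
    per_side = resolution // patch_size
    return per_side * per_side


for res in (224, 448):
    print(res, image_token_count(res))
# 224 256
# 448 1024
```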